{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "private_outputs": true,
      "provenance": [],
      "collapsed_sections": [
        "Cb00VhNkr4PB",
        "ASJC2S6kHBzX",
        "qy7RhL46JEZx"
      ],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "accelerator": "GPU",
    "gpuClass": "standard"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/ardha27/AI-Song-Cover-RVC/blob/main/Training_V2_and_Youtube_Audio_Download_%26_Splitting_Audio_combined.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **RVC Training and Youtube Audio Download & Splitting Audio - All in One**\n",
        "\n",
        "---\n",
        "\n",
        "\n",
        "RVC Training V2 and Youtube Audio Download & Splitting Audio script by Ardha<br>\n",
        "Combined by Minato Isuki<br>\n",
        "<br>\n",
        "Changelog (15:12 GMT+7 | 10/9): Apply weight asset directory commit from Ardha\n",
        "\n",
        "---\n",
        "\n",
        "<br>*If there is any issue within the code (or any typo), please reply to my pull request or create a issue then mention my pull request.*"
      ],
      "metadata": {
        "id": "2FOERx5gmcWz"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "**Please input all information in these two input form before executing \"Run All\".**<br>\n",
        "**If your audio is in Google Drive, please execute the mount Google Drive script at Step 2.1 before filling the form.**"
      ],
      "metadata": {
        "id": "6Kex44S5iBk0"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### **How to get Google Drive path**\n",
        "*Click the arrow to show instructions*"
      ],
      "metadata": {
        "id": "Cb00VhNkr4PB"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "[Video instructions](https://img.guildedcdn.com/ContentMediaGenericFiles/f380a54fb1065b2f12dad7b4ba09211e-Full.mp4?w=1280&h=720)<br>\n",
        "*Image instructions will be created (or not, Colab uses base64 which lags my PC pretty bad).*"
      ],
      "metadata": {
        "id": "qtsESLAys-Ia"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "### **Step 1: Filling form (For audio download and for RVC)**"
      ],
      "metadata": {
        "id": "szqqFpaqsEwh"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "#@title Download Youtube Audio - Input Form\n",
        "\n",
        "#@markdown If you are making a new model (or getting new audio for a model), use Splitting mode, else, use Seperate mode for music seperation.\n",
        "AudioMode = \"Separate\" #@param [\"Separate\", \"Splitting\"]\n",
        "#@markdown Where did you store your audio/music at?\n",
        "dataset = \"Youtube\" #@param [\"Youtube\", \"Drive\"]\n",
        "#@markdown If it's YouTube, enter the URL here.\n",
        "url = \"\" #@param {type:\"string\"}\n",
        "#@markdown If it's Google Drive, start the mount Google Drive script in Step 2.1 then enter the path here. The path should begin with /content/drive/MyDrive.\n",
        "path_audio = \"\" #@param {type:\"string\"}\n",
        "#@markdown Enter the music name or the sound model here.\n",
        "AUDIO_NAME = \"\" #@param {type:\"string\"}"
      ],
      "metadata": {
        "id": "Tf9MUvvfPt8v",
        "cellView": "form"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "#@title RVC Training - Input Form\n",
        "#@markdown Select \"inference\" to mess with your AI model (maybe making an AI cover).<br>\n",
        "#@markdown Select \"training\" if you are going to make a new model or continue training.\n",
        "RVCMode = \"inference\" #@param [\"inference\", \"training\"]\n",
        "#@markdown Enter your AI model name to inference/training\n",
        "MODELNAME = \"kanna\"  #@param {type:\"string\"}"
      ],
      "metadata": {
        "id": "KTUIsJldFm5m",
        "cellView": "form"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "### **Step 2: Download the audio, split it (if selected) and launch RVC WebUI**"
      ],
      "metadata": {
        "id": "vpGDWwsjsPEJ"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### **2.1. Download Youtube Audio and Split it**"
      ],
      "metadata": {
        "id": "ASJC2S6kHBzX"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "#@title - Mount Google Drive\n",
        "\n",
        "from google.colab import drive\n",
        "drive.mount('/content/drive')"
      ],
      "metadata": {
        "id": "jwu07JgqoFON"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "#@title - Install libraries for Youtube WAV download\n",
        "if dataset == \"Drive\":\n",
        "  print(\"Dataset is set to Drive. Skipping this section\")\n",
        "elif dataset == \"Youtube\":\n",
        "  !pip install yt_dlp\n",
        "  !pip install ffmpeg\n",
        "  !mkdir youtubeaudio"
      ],
      "metadata": {
        "id": "KSYral95Plvb"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "#@title - Download audio\n",
        "from __future__ import unicode_literals\n",
        "\n",
        "if dataset == \"Drive\":\n",
        "  print(\"Dataset is set to Drive. Skipping this section\")\n",
        "elif dataset == \"Youtube\":\n",
        "  import yt_dlp\n",
        "  import ffmpeg\n",
        "  import sys\n",
        "\n",
        "\n",
        "  ydl_opts = {\n",
        "      'format': 'bestaudio/best',\n",
        "  #    'outtmpl': 'output.%(ext)s',\n",
        "      'postprocessors': [{\n",
        "          'key': 'FFmpegExtractAudio',\n",
        "          'preferredcodec': 'wav',\n",
        "      }],\n",
        "      \"outtmpl\": f'youtubeaudio/{AUDIO_NAME}',  # this is where you can edit how you'd like the filenames to be formatted\n",
        "  }\n",
        "  def download_from_url(url):\n",
        "      ydl.download([url])\n",
        "      # stream = ffmpeg.input('output.m4a')\n",
        "      # stream = ffmpeg.output(stream, 'output.wav')\n",
        "\n",
        "\n",
        "  with yt_dlp.YoutubeDL(ydl_opts) as ydl:\n",
        "\n",
        "        download_from_url(url)"
      ],
      "metadata": {
        "id": "RYfXSP_QPoJ2"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "#@title - Install Demucs for separating audio\n",
        "!python3 -m pip install -U demucs"
      ],
      "metadata": {
        "id": "3fIc2i15SKBN",
        "cellView": "form"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "#@title - Separate vocal and instrument/noise using Demucs\n",
        "import subprocess\n",
        "AUDIO_INPUT = f\"/content/youtubeaudio/{AUDIO_NAME}.wav\"\n",
        "\n",
        "if dataset == \"Drive\":\n",
        "  command = f\"demucs --two-stems=vocals {path_audio}\"\n",
        "elif dataset == \"Youtube\":\n",
        "  command = f\"demucs --two-stems=vocals {AUDIO_INPUT}\"\n",
        "\n",
        "result = subprocess.run(command.split(), stdout=subprocess.PIPE)\n",
        "print(result.stdout.decode())"
      ],
      "metadata": {
        "id": "_YS1PLGmSQ40"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "#@title - Copy downloaded audio to Google Drive\n",
        "!mkdir -p /content/drive/MyDrive/audio/{AUDIO_NAME}\n",
        "!cp -r /content/separated/htdemucs/{AUDIO_NAME}/* /content/drive/MyDrive/audio/{AUDIO_NAME}\n",
        "if dataset == \"Youtube\":\n",
        "  !cp -r /content/youtubeaudio/{AUDIO_NAME}.wav /content/drive/MyDrive/audio/{AUDIO_NAME}"
      ],
      "metadata": {
        "id": "iXRfuu9YTUPu"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "#@title - Install library for audio splitting\n",
        "if AudioMode == \"Separate\":\n",
        "    print(\"Mode is set to Separate. Skipping this section\")\n",
        "elif AudioMode ==  \"Splitting\":\n",
        "  !pip install numpy\n",
        "  !pip install librosa\n",
        "  !pip install soundfile\n",
        "  !mkdir -p dataset/{AUDIO_NAME}"
      ],
      "metadata": {
        "id": "4z_C02unP5S2"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "#@title - Split audio to smaller duration\n",
        "import numpy as np\n",
        "import librosa\n",
        "import soundfile\n",
        "\n",
        "\n",
        "# This function is obtained from librosa.\n",
        "def get_rms(\n",
        "    y,\n",
        "    *,\n",
        "    frame_length=2048,\n",
        "    hop_length=512,\n",
        "    pad_mode=\"constant\",\n",
        "):\n",
        "    padding = (int(frame_length // 2), int(frame_length // 2))\n",
        "    y = np.pad(y, padding, mode=pad_mode)\n",
        "\n",
        "    axis = -1\n",
        "    # put our new within-frame axis at the end for now\n",
        "    out_strides = y.strides + tuple([y.strides[axis]])\n",
        "    # Reduce the shape on the framing axis\n",
        "    x_shape_trimmed = list(y.shape)\n",
        "    x_shape_trimmed[axis] -= frame_length - 1\n",
        "    out_shape = tuple(x_shape_trimmed) + tuple([frame_length])\n",
        "    xw = np.lib.stride_tricks.as_strided(\n",
        "        y, shape=out_shape, strides=out_strides\n",
        "    )\n",
        "    if axis < 0:\n",
        "        target_axis = axis - 1\n",
        "    else:\n",
        "        target_axis = axis + 1\n",
        "    xw = np.moveaxis(xw, -1, target_axis)\n",
        "    # Downsample along the target axis\n",
        "    slices = [slice(None)] * xw.ndim\n",
        "    slices[axis] = slice(0, None, hop_length)\n",
        "    x = xw[tuple(slices)]\n",
        "\n",
        "    # Calculate power\n",
        "    power = np.mean(np.abs(x) ** 2, axis=-2, keepdims=True)\n",
        "\n",
        "    return np.sqrt(power)\n",
        "\n",
        "\n",
        "class Slicer:\n",
        "    def __init__(self,\n",
        "                 sr: int,\n",
        "                 threshold: float = -40.,\n",
        "                 min_length: int = 5000,\n",
        "                 min_interval: int = 300,\n",
        "                 hop_size: int = 20,\n",
        "                 max_sil_kept: int = 5000):\n",
        "        if not min_length >= min_interval >= hop_size:\n",
        "            raise ValueError('The following condition must be satisfied: min_length >= min_interval >= hop_size')\n",
        "        if not max_sil_kept >= hop_size:\n",
        "            raise ValueError('The following condition must be satisfied: max_sil_kept >= hop_size')\n",
        "        min_interval = sr * min_interval / 1000\n",
        "        self.threshold = 10 ** (threshold / 20.)\n",
        "        self.hop_size = round(sr * hop_size / 1000)\n",
        "        self.win_size = min(round(min_interval), 4 * self.hop_size)\n",
        "        self.min_length = round(sr * min_length / 1000 / self.hop_size)\n",
        "        self.min_interval = round(min_interval / self.hop_size)\n",
        "        self.max_sil_kept = round(sr * max_sil_kept / 1000 / self.hop_size)\n",
        "\n",
        "    def _apply_slice(self, waveform, begin, end):\n",
        "        if len(waveform.shape) > 1:\n",
        "            return waveform[:, begin * self.hop_size: min(waveform.shape[1], end * self.hop_size)]\n",
        "        else:\n",
        "            return waveform[begin * self.hop_size: min(waveform.shape[0], end * self.hop_size)]\n",
        "\n",
        "    def slice(self, waveform):\n",
        "        if len(waveform.shape) > 1:\n",
        "            samples = waveform.mean(axis=0)\n",
        "        else:\n",
        "            samples = waveform\n",
        "        if samples.shape[0] <= self.min_length:\n",
        "            return [waveform]\n",
        "        rms_list = get_rms(y=samples, frame_length=self.win_size, hop_length=self.hop_size).squeeze(0)\n",
        "        sil_tags = []\n",
        "        silence_start = None\n",
        "        clip_start = 0\n",
        "        for i, rms in enumerate(rms_list):\n",
        "            # Keep looping while frame is silent.\n",
        "            if rms < self.threshold:\n",
        "                # Record start of silent frames.\n",
        "                if silence_start is None:\n",
        "                    silence_start = i\n",
        "                continue\n",
        "            # Keep looping while frame is not silent and silence start has not been recorded.\n",
        "            if silence_start is None:\n",
        "                continue\n",
        "            # Clear recorded silence start if interval is not enough or clip is too short\n",
        "            is_leading_silence = silence_start == 0 and i > self.max_sil_kept\n",
        "            need_slice_middle = i - silence_start >= self.min_interval and i - clip_start >= self.min_length\n",
        "            if not is_leading_silence and not need_slice_middle:\n",
        "                silence_start = None\n",
        "                continue\n",
        "            # Need slicing. Record the range of silent frames to be removed.\n",
        "            if i - silence_start <= self.max_sil_kept:\n",
        "                pos = rms_list[silence_start: i + 1].argmin() + silence_start\n",
        "                if silence_start == 0:\n",
        "                    sil_tags.append((0, pos))\n",
        "                else:\n",
        "                    sil_tags.append((pos, pos))\n",
        "                clip_start = pos\n",
        "            elif i - silence_start <= self.max_sil_kept * 2:\n",
        "                pos = rms_list[i - self.max_sil_kept: silence_start + self.max_sil_kept + 1].argmin()\n",
        "                pos += i - self.max_sil_kept\n",
        "                pos_l = rms_list[silence_start: silence_start + self.max_sil_kept + 1].argmin() + silence_start\n",
        "                pos_r = rms_list[i - self.max_sil_kept: i + 1].argmin() + i - self.max_sil_kept\n",
        "                if silence_start == 0:\n",
        "                    sil_tags.append((0, pos_r))\n",
        "                    clip_start = pos_r\n",
        "                else:\n",
        "                    sil_tags.append((min(pos_l, pos), max(pos_r, pos)))\n",
        "                    clip_start = max(pos_r, pos)\n",
        "            else:\n",
        "                pos_l = rms_list[silence_start: silence_start + self.max_sil_kept + 1].argmin() + silence_start\n",
        "                pos_r = rms_list[i - self.max_sil_kept: i + 1].argmin() + i - self.max_sil_kept\n",
        "                if silence_start == 0:\n",
        "                    sil_tags.append((0, pos_r))\n",
        "                else:\n",
        "                    sil_tags.append((pos_l, pos_r))\n",
        "                clip_start = pos_r\n",
        "            silence_start = None\n",
        "        # Deal with trailing silence.\n",
        "        total_frames = rms_list.shape[0]\n",
        "        if silence_start is not None and total_frames - silence_start >= self.min_interval:\n",
        "            silence_end = min(total_frames, silence_start + self.max_sil_kept)\n",
        "            pos = rms_list[silence_start: silence_end + 1].argmin() + silence_start\n",
        "            sil_tags.append((pos, total_frames + 1))\n",
        "        # Apply and return slices.\n",
        "        if len(sil_tags) == 0:\n",
        "            return [waveform]\n",
        "        else:\n",
        "            chunks = []\n",
        "            if sil_tags[0][0] > 0:\n",
        "                chunks.append(self._apply_slice(waveform, 0, sil_tags[0][0]))\n",
        "            for i in range(len(sil_tags) - 1):\n",
        "                chunks.append(self._apply_slice(waveform, sil_tags[i][1], sil_tags[i + 1][0]))\n",
        "            if sil_tags[-1][1] < total_frames:\n",
        "                chunks.append(self._apply_slice(waveform, sil_tags[-1][1], total_frames))\n",
        "            return chunks\n",
        "\n",
        "if AudioMode == \"Separate\":\n",
        "    print(\"Mode is set to Separate. Skipping this section\")\n",
        "\n",
        "elif AudioMode ==  \"Splitting\":\n",
        "  audio, sr = librosa.load(f'/content/separated/htdemucs/{AUDIO_NAME}/vocals.wav', sr=None, mono=False)  # Load an audio file with librosa.\n",
        "  slicer = Slicer(\n",
        "      sr=sr,\n",
        "      threshold=-40,\n",
        "      min_length=5000,\n",
        "      min_interval=200,\n",
        "      hop_size=10,\n",
        "      max_sil_kept=500\n",
        "  )\n",
        "  chunks = slicer.slice(audio)\n",
        "  for i, chunk in enumerate(chunks):\n",
        "      if len(chunk.shape) > 1:\n",
        "          chunk = chunk.T  # Swap axes if the audio is stereo.\n",
        "      soundfile.write(f'/content/dataset/{AUDIO_NAME}/split_{i}.wav', chunk, sr)  # Save sliced audio files with soundfile."
      ],
      "metadata": {
        "id": "w68iU7zQQYjA"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "#@title - Copy the splited audio to Google Drive\n",
        "if AudioMode == \"Separate\":\n",
        "    print(\"Mode is set to Separate. Skipping this section\")\n",
        "elif AudioMode ==  \"Splitting\":\n",
        "  !mkdir -p /content/drive/MyDrive/dataset/{AUDIO_NAME}\n",
        "  !cp -r /content/dataset/{AUDIO_NAME}/* /content/drive/MyDrive/dataset/{AUDIO_NAME}"
      ],
      "metadata": {
        "id": "U5c8uqCrUE3B"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### **2.2. Firing up RVC Training V2**"
      ],
      "metadata": {
        "id": "qy7RhL46JEZx"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "#@title - Mount Google Drive (just in case)\n",
        "\n",
        "from google.colab import drive\n",
        "drive.mount('/content/drive')"
      ],
      "metadata": {
        "id": "zGG7d7RyKzWh"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "#@title - Cloning Github\n",
        "\n",
        "!git clone https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI.git\n",
        "!cp -r /content/drive/MyDrive/RVC/weights/* /content/Retrieval-based-Voice-Conversion-WebUI/assets/weights\n",
        "!mkdir -p /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}\n",
        "!cp -r /content/drive/MyDrive/RVC/{MODELNAME}/* /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}\n",
        "%cd /content/Retrieval-based-Voice-Conversion-WebUI"
      ],
      "metadata": {
        "id": "ge_97mfpgqTm"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "#@title - Install library for RVC\n",
        "!pip install pip==23.3.1\n",
        "!pip install pyworld==0.3.2\n",
        "!pip install -r requirements.txt"
      ],
      "metadata": {
        "id": "lxJmBrEF8xGB"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "#@title - Install aria2\n",
        "!apt -y install -qq aria2"
      ],
      "metadata": {
        "id": "pqE0PrnuRqI2"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "#@title - Download Pretrained Model for Training\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/D32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained -o D32k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/D40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained -o D40k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/D48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained -o D48k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/G32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained -o G32k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/G40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained -o G40k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/G48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained -o G48k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0D32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained -o f0D32k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0D40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained -o f0D40k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0D48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained -o f0D48k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0G32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained -o f0G32k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0G40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained -o f0G40k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0G48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained -o f0G48k.pth\n",
        "\n",
        "#RVC V2\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/D32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained_v2 -o D32k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/D40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained_v2 -o D40k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/D48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained_v2 -o D48k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/G32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained_v2 -o G32k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/G40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained_v2 -o G40k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/G48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained_v2 -o G48k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/f0D32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained_v2 -o f0D32k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/f0D40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained_v2 -o f0D40k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/f0D48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained_v2 -o f0D48k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/f0G32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained_v2 -o f0G32k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/f0G40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained_v2 -o f0G40k.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/f0G48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/pretrained_v2 -o f0G48k.pth"
      ],
      "metadata": {
        "id": "UG3XpUwEomUz"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "#@title - Download Pretrained Model for Audio Splitting\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/HP2-人声vocals+非人声instrumentals.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/uvr5_weights -o HP2-人声vocals+非人声instrumentals.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/HP5-主旋律人声vocals+其他instrumentals.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/uvr5_weights -o HP5-主旋律人声vocals+其他instrumentals.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://github.com/TRvlvr/model_repo/releases/download/all_public_uvr_models/5_HP-Karaoke-UVR.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/uvr5_weights -o 5_HP-Karaoke-UVR.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/HP2_all_vocals.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/uvr5_weights -o HP2_all_vocals.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/HP3_all_vocals.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/uvr5_weights -o HP3_all_vocals.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/HP5_only_main_vocal.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/uvr5_weights -o HP5_only_main_vocal.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/VR-DeEchoAggressive.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/uvr5_weights -o VR-DeEchoAggressive.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/VR-DeEchoDeReverb.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/uvr5_weights -o VR-DeEchoDeReverb.pth\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/VR-DeEchoNormal.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/uvr5_weights -o VR-DeEchoNormal.pth\n"
      ],
      "metadata": {
        "id": "HugjmZqZRuiF"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "#@title - Download hubert_base and rmvpe\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/hubert_base.pt -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/hubert -o hubert_base.pt\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/rmvpe.pt -d /content/Retrieval-based-Voice-Conversion-WebUI/assets/rmvpe -o rmvpe.pt"
      ],
      "metadata": {
        "id": "2RCaT9FTR0ej"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "#@title - Run Web\n",
        "%cd /content/Retrieval-based-Voice-Conversion-WebUI\n",
        "# %load_ext tensorboard\n",
        "# %tensorboard --logdir /content/Retrieval-based-Voice-Conversion-WebUI/logs\n",
        "!python3 infer-web.py --colab --pycmd python3"
      ],
      "metadata": {
        "id": "7vh6vphDwO0b"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "#@title - Copy Training Result to Drive\n",
        "if RVCMode == \"training\":\n",
        "  !mkdir -p /content/drive/MyDrive/RVC/{MODELNAME}\n",
        "  !mkdir -p /content/drive/MyDrive/RVC/weights\n",
        "\n",
        "  !cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/G_*.pth /content/drive/MyDrive/RVC/{MODELNAME}\n",
        "  !cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/D_*.pth /content/drive/MyDrive/RVC/{MODELNAME}\n",
        "  !cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/added_*.index /content/drive/MyDrive/RVC/{MODELNAME}\n",
        "  !cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/trained*.index /content/drive/MyDrive/RVC/{MODELNAME}\n",
        "  !cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/total_*.npy /content/drive/MyDrive/RVC/{MODELNAME}\n",
        "\n",
        "  !cp /content/Retrieval-based-Voice-Conversion-WebUI/assets/weights/{MODELNAME}.pth /content/drive/MyDrive/RVC/weights/{MODELNAME}.pth\n",
        "elif RVCMode == \"inference\":\n",
        "  print(\"Mode set to inference. Skipping this section\")"
      ],
      "metadata": {
        "id": "FgJuNeAwx5Y_"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}